Imports

Task Descriptions (you can delete the description cell if you want)

.Target website : 巴哈姆特 (Bahamut)

.Task : 

  1.  Find the top 20 hot boards (熱門看版) on the home page of 巴哈姆特.

  2.  Open each board and find all articles that have more than 5 GP on the first page of its 文章列表 (article list).

  3.  Store the required information of each article in a dataframe.
    (Since there are 20 hot boards, you will likely end up with 20 dataframes, one per board.)

  4.  Create an interactive Jupyter notebook cell to display the results of your web scraping.
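
The filtering rule in step 2 can be sketched as follows. This is a minimal illustration rather than a reference solution: it assumes each scraped article is already a dict whose `'GP'` value is the raw text taken from the page, and the helper names `to_gp` and `keep_popular` are hypothetical. Note that "more than 5 GP" is read as strictly greater than 5.

```python
def to_gp(text):
    """Convert a scraped GP label to an int; empty or non-numeric labels count as 0."""
    text = text.strip()
    return int(text) if text.isdecimal() else 0


def keep_popular(articles, threshold=5):
    """Keep only the articles whose GP is strictly greater than `threshold`."""
    return [a for a in articles if to_gp(a["GP"]) > threshold]


# Made-up sample records standing in for one board's first page.
sample = [
    {"GP": "12", "標題": "Post A"},
    {"GP": "5", "標題": "Post B"},    # exactly 5 is not "more than 5"
    {"GP": "", "標題": "Post C"},     # no GP shown on the page
    {"GP": " 6 ", "標題": "Post D"},  # stray whitespace from the HTML
]
popular = keep_popular(sample)
```

How the GP label is actually rendered on the site is not covered here, so check the page source before relying on `to_gp` as written.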


.Final display format (mandatory): 

  1.  The columns of each dataframe should be [ 'GP', '子版', '標題', '顯示內文' ]
  
  2.  Each dataframe should be sorted by the GP of each row (article) in descending order.

  3.  The index of the dataframe should start from 0 and run in ascending order.

  4.  The 標題 of each article should be displayed as a hyperlink that points to the source page of the article.
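
One possible way to meet the four format requirements with pandas is sketched below. It assumes the scraping step has already produced a list of dicts per board; the keys `gp`, `sub_board`, `title`, `preview` and `url`, the function name `build_board_frame`, and the `example.com` URLs are all placeholders for illustration, not part of the assignment.

```python
import html

import pandas as pd

COLUMNS = ["GP", "子版", "標題", "顯示內文"]


def build_board_frame(records):
    """Build one board's dataframe in the required display format."""
    rows = []
    for r in records:
        # Render the title as an HTML anchor pointing at the article page.
        link = '<a href="{}" target="_blank">{}</a>'.format(
            html.escape(r["url"], quote=True), html.escape(r["title"])
        )
        rows.append(
            {"GP": int(r["gp"]), "子版": r["sub_board"],
             "標題": link, "顯示內文": r["preview"]}
        )
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Sort by GP (highest first), then renumber the index from 0.
    return df.sort_values("GP", ascending=False).reset_index(drop=True)


# Made-up records standing in for scraped data.
records = [
    {"gp": 7, "sub_board": "Chat", "title": "First post",
     "preview": "...", "url": "https://example.com/a"},
    {"gp": 30, "sub_board": "News", "title": "Second post",
     "preview": "...", "url": "https://example.com/b"},
    {"gp": 12, "sub_board": "Q&A", "title": "Third post",
     "preview": "...", "url": "https://example.com/c"},
]
df = build_board_frame(records)

# escape=False keeps the anchors as real links; lifting max_colwidth
# avoids long link strings being truncated in the rendered table.
with pd.option_context("display.max_colwidth", None):
    table_html = df.to_html(escape=False)
```

In a notebook, `IPython.display.HTML(table_html)` renders the table with clickable titles.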

Home page of 巴哈姆特

1.png

Scroll down to find the top 20 hot boards

2.png

Open the 文章列表 (article list) of each hot board and find all the required information

4.png

Don't panic; this task can be done in fewer than 50 lines of code.

The result should look like the following:

HW_Scraping.gif

Task